Enforce validDocIds consensus in upsert task generators and add includeBitmaps to validDocIdsMetadata API #18853

Open
deepthi912 wants to merge 8 commits into apache:master from deepthi912:includeBitmapsInValidDocIdsMetadata

@deepthi912 deepthi912 commented Jun 25, 2026

Collaborator

Summary

Make the upsert task generators reject segments with inconsistent replicas before scheduling, instead of relying on the executor to fail those tasks after the fact. Concretely, this PR:

  1. Adds an opt-in includeBitmaps flag to the validDocIdsMetadata endpoint so the controller can batch-fetch each replica's validDocIds bitmap in one call.
  2. Applies the executor's validDocIds consensus checks (CRC match, server health, and EQUAL / UNSAFE / MOST_VALID_DOCS) in UpsertCompactionTaskGenerator and UpsertCompactMergeTaskGenerator, skipping any segment whose replicas disagree.
  3. Adds a validDocIdsValidationMode config (STRICT default / EXECUTOR_ONLY) to toggle the generator-side checks.

Follow-up to #17696 (which added the executor-side consensus), per this review comment.

Background

#17696 made the upsert compaction executor reconcile validDocIds across replicas before compacting — it checks CRC, server health, and a configurable validDocIdsConsensusMode (UNSAFE / EQUAL / MOST_VALID_DOCS), failing the task rather than letting a less-complete replica overwrite a more-complete one.

The generator, however, still scheduled a task for every eligible segment, even when its replicas were inconsistent (a CRC mismatch mid-reload, an unhealthy server, or divergent validDocIds). The executor then has to fail those tasks to stay safe — which wastes a segment download and a task slot every cycle, and the same segment keeps getting re-picked and re-failed.
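To make the background concrete, here is a minimal sketch of what a generator-side pre-check could look like for a single segment. Everything in it is an assumption for illustration: `ConsensusMode`, `ReplicaInfo`, and `GeneratorConsensusSketch.select` are hypothetical stand-ins rather than the PR's types, the mode semantics are inferred from the mode names, and `EQUAL` is approximated by comparing valid doc counts.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for illustration only; these are not the PR's types.
enum ConsensusMode { UNSAFE, EQUAL, MOST_VALID_DOCS }

record ReplicaInfo(String server, String crc, boolean healthy, long validDocCount) { }

final class GeneratorConsensusSketch {
  private GeneratorConsensusSketch() { }

  /** Returns the replica to build the task from, or empty if the segment should be skipped this cycle. */
  static Optional<ReplicaInfo> select(ConsensusMode mode, String expectedCrc, List<ReplicaInfo> replicas) {
    // A replica is usable only if its server is healthy and its CRC matches the expected one.
    List<ReplicaInfo> usable = replicas.stream()
        .filter(r -> r.healthy() && r.crc().equals(expectedCrc))
        .toList();
    if (usable.isEmpty()) {
      return Optional.empty();
    }
    switch (mode) {
      case UNSAFE:
        // No cross-replica agreement required: take any usable replica.
        return Optional.of(usable.get(0));
      case MOST_VALID_DOCS:
        // Assumed: every replica must be usable, then prefer the most complete one.
        if (usable.size() != replicas.size()) {
          return Optional.empty();
        }
        return usable.stream().max(Comparator.comparingLong(ReplicaInfo::validDocCount));
      case EQUAL:
      default:
        // Assumed: every replica must be usable and report the same valid doc count.
        if (usable.size() != replicas.size()) {
          return Optional.empty();
        }
        long first = usable.get(0).validDocCount();
        boolean allEqual = usable.stream().allMatch(r -> r.validDocCount() == first);
        return allEqual ? Optional.of(usable.get(0)) : Optional.empty();
    }
  }
}
```

An empty result means "do not schedule this segment now", which is the case that would otherwise cost a segment download and a task slot before the executor fails the task.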

Default Setting:

{
  "tableName": "myTable_REALTIME",
  "task": {
    "taskTypeConfigsMap": {
      "UpsertCompactionTask": {
        "schedule": "0 0 */6 * * ?",
        "validDocIdsConsensusMode": "EQUAL",
        "validDocIdsValidationMode": "STRICT"
      }
    }
  }
}

Backward compatibility

validDocIdsValidationMode defaults to STRICT, so generators now enforce consensus by default; set EXECUTOR_ONLY to restore the old behavior.
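As a rough illustration of that default, the sketch below resolves the mode from a task config map and falls back to `STRICT` when the key is absent. The enum and class here are hypothetical stand-ins, not the PR's helpers, and the case-insensitive parsing is an assumption; only the key name and the two mode names come from the description above.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the PR's validation-mode enum.
enum ValidationMode { STRICT, EXECUTOR_ONLY }

final class ValidationModeConfigSketch {
  static final String VALIDATION_MODE_KEY = "validDocIdsValidationMode";
  static final ValidationMode DEFAULT_MODE = ValidationMode.STRICT;

  private ValidationModeConfigSketch() { }

  /** Resolves the mode from task configs; a missing or blank value yields the STRICT default. */
  static ValidationMode resolve(Map<String, String> taskConfigs) {
    String raw = taskConfigs.get(VALIDATION_MODE_KEY);
    if (raw == null || raw.isBlank()) {
      return DEFAULT_MODE;
    }
    // Assumption: parsing is case-insensitive; unknown values throw IllegalArgumentException.
    return ValidationMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
  }
}
```

Under this sketch, a table that never set the key gets generator-side enforcement after the upgrade, which is why the change is flagged under backward compatibility.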

Testing

Unit tests cover each mode through the generators — EQUAL (agree / disagree / replica missing / CRC mismatch / unhealthy / missing bitmap), UNSAFE, MOST_VALID_DOCS — plus the config parsing/resolution helpers.

@codecov-commenter

codecov-commenter commented Jun 25, 2026


Codecov Report

❌ Patch coverage is 61.66667% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.78%. Comparing base (dc4e957) to head (018b58d).
⚠️ Report is 11 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...che/pinot/plugin/minion/tasks/MinionTaskUtils.java | 68.33% | 15 Missing and 4 partials ⚠️ |
| ...tcompactmerge/UpsertCompactMergeTaskGenerator.java | 55.17% | 11 Missing and 2 partials ⚠️ |
| ...psertcompaction/UpsertCompactionTaskGenerator.java | 65.38% | 8 Missing and 1 partial ⚠️ |
| .../org/apache/pinot/core/common/MinionConstants.java | 0.00% | 2 Missing ⚠️ |
| ...upsertcompaction/UpsertCompactionTaskExecutor.java | 0.00% | 2 Missing ⚠️ |
| ...rtcompactmerge/UpsertCompactMergeTaskExecutor.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18853      +/-   ##
============================================
- Coverage     64.81%   64.78%   -0.03%     
- Complexity     1322     1343      +21     
============================================
  Files          3393     3393              
  Lines        211246   211402     +156     
  Branches      33208    33252      +44     
============================================
+ Hits         136917   136963      +46     
- Misses        63284    63367      +83     
- Partials      11045    11072      +27     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-21 | 64.78% <61.66%> (-0.03%) ⬇️ |
| temurin | 64.78% <61.66%> (-0.03%) ⬇️ |
| unittests | 64.78% <61.66%> (-0.03%) ⬇️ |
| unittests1 | 56.99% <0.00%> (-0.04%) ⬇️ |
| unittests2 | 37.17% <61.66%> (+<0.01%) ⬆️ |


@deepthi912 deepthi912 changed the title Add includeBitmaps option to validDocIdsMetadata endpoint Enforce validDocIds consensus in upsert task generators and add includeBitmaps to validDocIdsMetadata API Jun 26, 2026
@deepthi912 deepthi912 force-pushed the includeBitmapsInValidDocIdsMetadata branch from ef91178 to 8571ed6 Compare June 26, 2026 17:08
Mirror the executor's validDocIds enforcement at generation time so inconsistent
segments are never scheduled. UpsertCompactionTaskGenerator and
UpsertCompactMergeTaskGenerator now validate each segment's replicas (CRC match,
server health, and EQUAL/UNSAFE/MOST_VALID_DOCS consensus) via the shared
MinionTaskUtils.selectValidDocIdsMetadataForConsensus, requesting includeBitmaps
only for EQUAL and requiring all assigned replicas to respond for the strict
modes.

A new validDocIdsValidationMode config (STRICT default, EXECUTOR_ONLY) gates the
generator-side checks: STRICT runs them in both generator and executor;
EXECUTOR_ONLY downgrades the generator to a lenient pick and leaves the executor
as the sole gate.
@deepthi912 deepthi912 force-pushed the includeBitmapsInValidDocIdsMetadata branch from 8571ed6 to cac3f5e Compare June 26, 2026 17:45
@deepthi912 deepthi912 added the upsert and configuration labels Jun 26, 2026

@xiangfu0 xiangfu0 left a comment

Contributor

Found one high-signal issue; see inline comment.

if (consensusMode == MinionConstants.ValidDocIdsConsensusMode.EQUAL) {
  ValidDocIdsMetadataInfo first = usableReplicas.get(0);
  RoaringBitmap consensusBitmap = deserializeBitmapOrNull(first);
  if (consensusBitmap == null) {

Contributor

This makes controller-first rolling upgrades incompatible under the new default STRICT/EQUAL path. Older servers still answer this endpoint but omit bitmap, and ValidDocIdsMetadataInfo explicitly treats that as expected for old servers; here we convert that mixed-version response into a hard skip, so upsert compaction/compact-merge task generation stops until every server is upgraded. Please keep the generator default executor-only, or add an old-server fallback before making bitmap-based prescheduling the default.

@deepthi912 deepthi912 added the backward-incompat label Jun 27, 2026
The consensus modes (EQUAL/MOST_VALID_DOCS) fetch validDocIds metadata from
every replica, and EQUAL also carries the serialized bitmap in each entry, so
reusing the regular numSegmentsBatchPerServerRequest (default 500) can produce
very large per-request payloads.

Add a shared validDocIdsConsensusFetchBatchSize knob (default 10) on
UpsertCompactionTask and route both the compaction and compact-merge generators
through MinionTaskUtils.resolveValidDocIdsFetchBatchSize: UNSAFE keeps the
regular batch, the consensus modes use the smaller consensus batch. The smaller
batch is intentional for the bitmap-bearing fetch and does not inherit a
user-set numSegmentsBatchPerServerRequest.
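
The batch-size rule in this commit can be sketched as follows. This is only an illustration under stated assumptions: `FetchConsensusMode` and `FetchBatchSizeSketch` are hypothetical names, and the body is a guess at the behaviour the message describes, not the code of `MinionTaskUtils.resolveValidDocIdsFetchBatchSize`. The two config keys and the defaults (500 and 10) are taken from the message.

```java
import java.util.Map;

// Hypothetical stand-in for the PR's consensus-mode enum.
enum FetchConsensusMode { UNSAFE, EQUAL, MOST_VALID_DOCS }

final class FetchBatchSizeSketch {
  static final String REGULAR_BATCH_KEY = "numSegmentsBatchPerServerRequest";
  static final String CONSENSUS_BATCH_KEY = "validDocIdsConsensusFetchBatchSize";
  static final int DEFAULT_REGULAR_BATCH = 500;
  static final int DEFAULT_CONSENSUS_BATCH = 10;

  private FetchBatchSizeSketch() { }

  /** UNSAFE keeps the regular batch; consensus modes use the smaller consensus batch. */
  static int resolve(FetchConsensusMode mode, Map<String, String> taskConfigs) {
    if (mode == FetchConsensusMode.UNSAFE) {
      return parseOrDefault(taskConfigs.get(REGULAR_BATCH_KEY), DEFAULT_REGULAR_BATCH);
    }
    // Deliberately ignores a user-set regular batch size for the consensus fetch.
    return parseOrDefault(taskConfigs.get(CONSENSUS_BATCH_KEY), DEFAULT_CONSENSUS_BATCH);
  }

  private static int parseOrDefault(String raw, int fallback) {
    return (raw == null || raw.isBlank()) ? fallback : Integer.parseInt(raw.trim());
  }
}
```

With this shape, a table that raised `numSegmentsBatchPerServerRequest` for other reasons would still fetch consensus metadata in small batches unless it also set the consensus-specific key.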
The consensus config (validDocIdsConsensusMode, validDocIdsValidationMode,
validDocIdsConsensusFetchBatchSize keys and defaults) is shared by the upsert
compaction, compact-merge, and segment-refresh tasks, so it belongs at the
top level of MinionConstants next to the ValidDocIdsConsensusMode and
ValidDocIdsValidationMode enums rather than nested under UpsertCompactionTask.

Update all references in both generators, both executors, MinionTaskUtils, and
the tests. Config key string values are unchanged. The two constants that
already shipped (VALID_DOC_IDS_CONSENSUS_MODE_KEY and its default) keep a
@deprecated forwarding alias at the old nested location for source/binary
compatibility.
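
A minimal sketch of the forwarding-alias pattern this commit describes, assuming a simplified class layout: `MinionConstantsSketch` is a hypothetical stand-in for `MinionConstants`, and only the key constant is shown (the default constant would presumably be forwarded the same way). The key string value is the one used in the table config earlier in this PR.

```java
// Hypothetical, simplified stand-in for MinionConstants; not the actual class.
final class MinionConstantsSketch {
  private MinionConstantsSketch() { }

  // New top-level home for the shared consensus config key.
  static final String VALID_DOC_IDS_CONSENSUS_MODE_KEY = "validDocIdsConsensusMode";

  static final class UpsertCompactionTask {
    private UpsertCompactionTask() { }

    /**
     * Old nested location, kept so existing callers still compile.
     *
     * @deprecated use {@link MinionConstantsSketch#VALID_DOC_IDS_CONSENSUS_MODE_KEY} instead.
     */
    @Deprecated
    static final String VALID_DOC_IDS_CONSENSUS_MODE_KEY =
        MinionConstantsSketch.VALID_DOC_IDS_CONSENSUS_MODE_KEY;
  }
}
```

Because both fields are compile-time `String` constants, `javac` inlines their values at use sites, so callers compiled against the old location keep working as long as the string value itself does not change.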

@Jackie-Jiang Jackie-Jiang left a comment

Contributor

Let's not include the bitmap check on the generator side; comparing valid doc counts should be good enough. Serializing back 500 (default batch size) bitmaps could be quite expensive and is likely unnecessary.

For the CRC check, we should also improve it to allow a data CRC match. Take a look at BaseTableDataManager.hasSameCRC().

@deepthi912

Collaborator Author

Yeah, true, I agree. In this case I changed it to 10 per batch, but that is not scalable, so I'm going to just check the count.

@xiangfu0 xiangfu0 left a comment

Contributor

Found a critical mixed-version issue; see inline comment.

if (consensusMode == MinionConstants.ValidDocIdsConsensusMode.EQUAL) {
  ValidDocIdsMetadataInfo first = usableReplicas.get(0);
  RoaringBitmap consensusBitmap = deserializeBitmapOrNull(first);
  if (consensusBitmap == null) {

Contributor

This still breaks mixed-version rolling upgrades. With the new defaults (EQUAL + STRICT), the generator requests includeBitmaps=true. Older servers ignore that query param and return ValidDocIdsMetadataInfo without bitmap, and this branch turns that into a hard skip, so controller-first upgrades stop scheduling upsert compaction / compact-merge tasks until all servers are upgraded or operators manually flip EXECUTOR_ONLY. Pinot normally needs controller/server roll-forward compatibility without per-table config changes, so this needs a fallback for older servers or the generator-side default needs to stay EXECUTOR_ONLY.

The generator-side EQUAL consensus check now requires every replica to report
the same valid doc count instead of comparing full validDocIds bitmaps.
Comparing counts avoids serializing a RoaringBitmap per replica back to the
controller, which is expensive for large upsert tables. The executor remains
the authoritative gate and still verifies byte-identical bitmaps before
compacting, so a count match that hides a set difference is scheduled-then-
failed there rather than mis-compacted.
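
The "count match that hides a set difference" case can be shown with a small self-contained example. It uses `java.util.BitSet` purely as a stand-in for `RoaringBitmap` so the snippet has no dependencies; the class and method names are hypothetical and do not come from the PR.

```java
import java.util.BitSet;

// Illustration only: BitSet stands in for RoaringBitmap.
final class CountVsBitmapSketch {
  private CountVsBitmapSketch() { }

  static BitSet validDocIds(int... docIds) {
    BitSet bits = new BitSet();
    for (int docId : docIds) {
      bits.set(docId);
    }
    return bits;
  }

  /** Generator-style check: only the number of valid docs is compared. */
  static boolean countsMatch(BitSet a, BitSet b) {
    return a.cardinality() == b.cardinality();
  }

  /** Executor-style check: the exact set of valid doc ids is compared. */
  static boolean bitmapsMatch(BitSet a, BitSet b) {
    return a.equals(b);
  }

  public static void main(String[] args) {
    BitSet replica1 = validDocIds(0, 1, 2);
    BitSet replica2 = validDocIds(0, 1, 3);
    System.out.println(countsMatch(replica1, replica2));  // true: both have 3 valid docs
    System.out.println(bitmapsMatch(replica1, replica2)); // false: doc 2 vs doc 3 differs
  }
}
```

In this situation the generator would schedule the segment and the executor would then fail the task, which is the trade-off the commit accepts in exchange for not shipping bitmaps to the controller.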

This rolls back the now-unnecessary generator bitmap-fetch machinery added
earlier on this branch (all unreleased): the includeBitmaps validDocIdsMetadata
endpoint param, the ValidDocIdsMetadataInfo bitmap field, the
ServerSegmentMetadataReader overload, and the validDocIdsConsensusFetchBatchSize
config knob. Generators go back to the regular per-server fetch batch.

Labels

backward-incompat: Introduces a backward-incompatible API or behavior change
configuration: Config changes (addition/deletion/change in behavior)
upsert: Related to upsert functionality

4 participants